AITopics | monte carlo policy evaluation

Collaborating Authors

monte carlo policy evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach

Neural Information Processing SystemsDec-24-2025, 06:53:47 GMT

Policy evaluation via Monte Carlo (MC) simulation is at the core of many MC Reinforcement Learning (RL) algorithms (e.g., policy gradient methods). In this context, the designer of the learning system specifies an interaction budget that the agent usually spends by collecting trajectories of within a simulator. However, is this data collection strategy the best option? To answer this question, in this paper, we consider as quality index the variance of an unbiased policy return estimator that uses trajectories of different lengths, i.e., . We first derive a closed-form expression of this variance that clearly shows the sub-optimality of the fixed-length trajectory schedule. Furthermore, it suggests that adaptive data collection strategies that spend the available budget sequentially might be able to allocate a larger portion of transitions in timesteps in which more accurate sampling is required to reduce the variance of the final estimate.

monte carlo policy evaluation, name change, truncating trajectory, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Add feedback

Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach

Neural Information Processing SystemsOct-10-2024, 14:59:48 GMT

Policy evaluation via Monte Carlo (MC) simulation is at the core of many MC Reinforcement Learning (RL) algorithms (e.g., policy gradient methods). In this context, the designer of the learning system specifies an interaction budget that the agent usually spends by collecting trajectories of fixed length within a simulator. However, is this data collection strategy the best option? To answer this question, in this paper, we consider as quality index the variance of an unbiased policy return estimator that uses trajectories of different lengths, i.e., truncated. We first derive a closed-form expression of this variance that clearly shows the sub-optimality of the fixed-length trajectory schedule.

adaptive approach, monte carlo policy evaluation, truncating trajectory, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Add feedback

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

Winnicki, Anna, Srikant, R.

arXiv.org Artificial IntelligenceFeb-28-2023

A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function. A well-known longstanding open problem in this context is to prove the convergence of such a scheme when the value function of a policy is estimated from data collected from a single sample path obtained from implementing the policy (see page 99 of [Sutton and Barto, 2018], page 8 of [Tsitsiklis, 2002]). We present a solution to the open problem by showing that a first-visit version of such a policy iteration scheme indeed converges to the optimal policy provided that the policy improvement step uses lookahead [Silver et al., 2016, Mnih et al., 2016, Silver et al., 2017b] rather than a simple greedy policy improvement. We provide results both for the original open problem in the tabular setting and also present extensions to the function approximation setting, where we show that the policy resulting from the algorithm performs close to the optimal policy within a function approximation error.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2301.09709

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback